import pandas as pd
import numpy as np
import plotly.express as px
from itables import show
import country_converter as cocoLearning: Emissions Data - Table, Map, and Chart
Introduction
In this exercise, you will work with a CO2 emissions dataset downloaded from Gapminder and produce a report with three tabs: a data table, a line chart, and a choropleth map.
The goal is to roughly replicate the Our World in Data visualization page on consumption-based CO2 emissions.
Be sure to view that page to get an idea of the final product.
Setup
You should have forked and cloned this repository to your local machine.
Now, create and select a virtual environment in VSCode.
Install the following packages:
- pandas
- plotly
- itables
- ipykernel
- jupyter
- country_converter
Download the data from Gapminder by selecting: Environment > Emissions > CO2 Total emissions, then downloading the CSV file into a
datafolder in your repository.
Data Import
Run the following code to import the necessary libraries:
Load in your dataset from gapminder below. View it in your data viewer to get an idea of the structure.
# Load the data
emissions = pd.read_csv(r"C:\Users\luisl\Documents\GitHub\co2_emissions_demo\data\co2_cons.csv")emissions.columnsIndex(['country', '1800', '1801', '1802', '1803', '1804', '1805', '1806',
'1807', '1808',
...
'2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021',
'2022'],
dtype='object', length=224)
Initial Cleaning
In this dataset, some values are given in thousands, with a “k” used to represent the thousands. This will cause problems when we try to make these columns numeric. So we need to clean this. We’ll do this for you, but pay close attention as you might need it for your final project.
First, let’s see the issue:
emissions.query("country == 'China'")[["country", "2020", "2021", "2022"]]| country | 2020 | 2021 | 2022 | |
|---|---|---|---|---|
| 32 | China | 9980.0 | 10.6k | 10.5k |
Notice the letter “k” at the end of “10.6k” as an example.
We can remove the “k” and multiply those values by 1000 with the following code:
for col in ["2021", "2022"]:
has_k = emissions[col].str.contains("k")
values = emissions[col].str.replace("k", "")
emissions[col] = np.where(has_k, values.astype(float) * 1000, values.astype(float))And check that it worked:
emissions.query("country == 'China'")[["country", "2020", "2021", "2022"]]| country | 2020 | 2021 | 2022 | |
|---|---|---|---|---|
| 32 | China | 9980.0 | 10600.0 | 10500.0 |
Table Section
Our goal is to create a table showing emissions for a few selected years and calculate absolute and relative changes.
- Subset the data to include
Country,2000, and2022columns only. - Calculate an “Absolute Change” column (
emissions_2022 - emissions_2000). - Calculate a “Relative Change” column (
(emissions_2022 - emissions_2000) / emissions_2000).
# Subset the data to include `country`, `2000`, and `2022` columns only.
table_df = emissions[["country", "2000", "2022"]].copy()
# Calculate absolute change as the difference between 2022 and 2000
table_df["Absolute Change"] = table_df["2022"] - table_df["2000"]
# Calculate relative change as the absolute change divided by the 2000 emissions, then multiplied by 100
table_df["Relative Change"] = (table_df["Absolute Change"] / table_df["2000"])*100
# Round to 0 decimal places, and add a % sign to the relative change
table_df["Relative Change"] = table_df["Relative Change"].round(0).astype(str) + "%"Now we can display this as an interactive table with itables:
show(table_df)| country | 2000 | 2022 | Absolute Change | Relative Change |
|---|---|---|---|---|
| Loading ITables v2.2.4 from the internet... (need help?) |
Chart Section
Our goal is to create a line chart from 1990 to 2022 for a few selected countries.
- Melt the data so that years become rows.
- Filter from 1990 to 2022 only.
- Choose 5 countries of your choice.
- Create a line chart with Plotly Express.
# Melt the dataset. Your id_vars should be "country", your var_name should be "year" and your value_name should be "emissions".
emissions_long = emissions.melt(
id_vars="country", var_name="year", value_name="emissions"
)
# Convert year to numeric using pd.to_numeric
emissions_long["year"] = pd.to_numeric(emissions_long["year"])
emissions_long
# Convert emissions to numeric using pd.to_numeric
pd.to_numeric(emissions_long["emissions"].astype(str).str.replace("−", "-"))
# Query for years between 1990 and 2022 (that is 1990, 1991, ..., 2022)
emissions_long_1990_2022 = emissions_long.query("1990 <= year <= 2022")
# Query for 5 countries (adjust these to any countries you like)
countries_of_interest = emissions_long_1990_2022.query(
"country in ['Peru', 'Argentina', 'Australia', 'Nigeria', 'España']"
)
# Create line chart. Year should be on the x-axis, emissions on the y-axis, and color should be by country.
fig_chart = px.line(countries_of_interest, x="year", y="emissions", color="country")
fig_chartEvaluate missing values
emissions_long.isna().sum()country 0
year 0
emissions 0
dtype: int64
Mapping Section
This part is done for you.
Goal: Create a choropleth map showing global emissions from 1990 to 2022.
This will be animated by year.
- Ensure each country has a 3-letter ISO code. We’ll use
country_converterfor that. - Create a map with
px.choroplethand useanimation_frameto show changes over time.
# Convert country names to ISO3 codes
emissions_long_1990_2022["country_code"] = coco.convert(
emissions_long_1990_2022["country"], to="ISO3"
)
fig_map = px.choropleth(
emissions_long_1990_2022,
locations="country_code",
color="emissions",
hover_name="country",
animation_frame="year",
title="Global CO2 Emissions (1990-2022)",
)
fig_map.show()Dynamic tabset
Below, Results into a tabbed interface.
show(table_df)| country | 2000 | 2022 | Absolute Change | Relative Change |
|---|---|---|---|---|
| Loading ITables v2.2.4 from the internet... (need help?) |
fig_chart.show()fig_map.show()# Query for 5 countries (adjust these to any countries you like)
countries_of_interest = ["Argentina", "Peru", "Bolivia", "Ecuador", "Brazil"]
country_data_dict = {}
for country in countries_of_interest:
country_data = emissions_long_1990_2022.query("country == @country")
country_data_dict[country] = country_data
# Print each country's data
for country, data in country_data_dict.items():
print(f"Data for {country}:")
print(data)
print("\n")Data for Argentina:
country year emissions country_code
36865 Argentina 1990 111.0 ARG
37059 Argentina 1991 120.0 ARG
37253 Argentina 1992 127.0 ARG
37447 Argentina 1993 122.0 ARG
37641 Argentina 1994 126.0 ARG
37835 Argentina 1995 129.0 ARG
38029 Argentina 1996 136.0 ARG
38223 Argentina 1997 143.0 ARG
38417 Argentina 1998 146.0 ARG
38611 Argentina 1999 148.0 ARG
38805 Argentina 2000 140.0 ARG
38999 Argentina 2001 129.0 ARG
39193 Argentina 2002 81.7 ARG
39387 Argentina 2003 112 ARG
39581 Argentina 2004 133 ARG
39775 Argentina 2005 140 ARG
39969 Argentina 2006 150 ARG
40163 Argentina 2007 156.0 ARG
40357 Argentina 2008 172.0 ARG
40551 Argentina 2009 157.0 ARG
40745 Argentina 2010 172.0 ARG
40939 Argentina 2011 189 ARG
41133 Argentina 2012 189 ARG
41327 Argentina 2013 198 ARG
41521 Argentina 2014 184.0 ARG
41715 Argentina 2015 199.0 ARG
41909 Argentina 2016 186.0 ARG
42103 Argentina 2017 190.0 ARG
42297 Argentina 2018 175.0 ARG
42491 Argentina 2019 166.0 ARG
42685 Argentina 2020 158.0 ARG
42879 Argentina 2021 171.0 ARG
43073 Argentina 2022 175.0 ARG
Data for Peru:
country year emissions country_code
36996 Peru 1990 23.7 PER
37190 Peru 1991 23.5 PER
37384 Peru 1992 23.9 PER
37578 Peru 1993 26.4 PER
37772 Peru 1994 26.7 PER
37966 Peru 1995 28.0 PER
38160 Peru 1996 27.4 PER
38354 Peru 1997 30.5 PER
38548 Peru 1998 31.4 PER
38742 Peru 1999 32.7 PER
38936 Peru 2000 32.2 PER
39130 Peru 2001 29.9 PER
39324 Peru 2002 30.9 PER
39518 Peru 2003 29.3 PER
39712 Peru 2004 33.1 PER
39906 Peru 2005 37.4 PER
40100 Peru 2006 32.3 PER
40294 Peru 2007 39.6 PER
40488 Peru 2008 49.9 PER
40682 Peru 2009 53.0 PER
40876 Peru 2010 61.7 PER
41070 Peru 2011 57.6 PER
41264 Peru 2012 57.8 PER
41458 Peru 2013 57.3 PER
41652 Peru 2014 61.5 PER
41846 Peru 2015 61.6 PER
42040 Peru 2016 62.2 PER
42234 Peru 2017 62.4 PER
42428 Peru 2018 64.4 PER
42622 Peru 2019 66.6 PER
42816 Peru 2020 57.3 PER
43010 Peru 2021 66.3 PER
43204 Peru 2022 68.3 PER
Data for Bolivia:
country year emissions country_code
36882 Bolivia 1990 9.07 BOL
37076 Bolivia 1991 9.56 BOL
37270 Bolivia 1992 10.2 BOL
37464 Bolivia 1993 10.7 BOL
37658 Bolivia 1994 11.4 BOL
37852 Bolivia 1995 11.2 BOL
38046 Bolivia 1996 11.5 BOL
38240 Bolivia 1997 10.7 BOL
38434 Bolivia 1998 11.0 BOL
38628 Bolivia 1999 11.0 BOL
38822 Bolivia 2000 8.73 BOL
39016 Bolivia 2001 8.4 BOL
39210 Bolivia 2002 9.02 BOL
39404 Bolivia 2003 9.42 BOL
39598 Bolivia 2004 9.22 BOL
39792 Bolivia 2005 9.94 BOL
39986 Bolivia 2006 10.3 BOL
40180 Bolivia 2007 11.4 BOL
40374 Bolivia 2008 12.5 BOL
40568 Bolivia 2009 13.1 BOL
40762 Bolivia 2010 13.5 BOL
40956 Bolivia 2011 15.2 BOL
41150 Bolivia 2012 16.3 BOL
41344 Bolivia 2013 17.9 BOL
41538 Bolivia 2014 20.1 BOL
41732 Bolivia 2015 21.4 BOL
41926 Bolivia 2016 22.8 BOL
42120 Bolivia 2017 23.5 BOL
42314 Bolivia 2018 23.6 BOL
42508 Bolivia 2019 23.3 BOL
42702 Bolivia 2020 21.7 BOL
42896 Bolivia 2021 23.5 BOL
43090 Bolivia 2022 24.3 BOL
Data for Ecuador:
country year emissions country_code
36910 Ecuador 1990 16.9 ECU
37104 Ecuador 1991 16.4 ECU
37298 Ecuador 1992 20.9 ECU
37492 Ecuador 1993 23.5 ECU
37686 Ecuador 1994 15.0 ECU
37880 Ecuador 1995 22.6 ECU
38074 Ecuador 1996 23.2 ECU
38268 Ecuador 1997 19.4 ECU
38462 Ecuador 1998 24.6 ECU
38656 Ecuador 1999 21.4 ECU
38850 Ecuador 2000 19.5 ECU
39044 Ecuador 2001 24.2 ECU
39238 Ecuador 2002 26.8 ECU
39432 Ecuador 2003 28.6 ECU
39626 Ecuador 2004 30.8 ECU
39820 Ecuador 2005 32.6 ECU
40014 Ecuador 2006 33.7 ECU
40208 Ecuador 2007 38.2 ECU
40402 Ecuador 2008 35.4 ECU
40596 Ecuador 2009 36.6 ECU
40790 Ecuador 2010 39.3 ECU
40984 Ecuador 2011 42.8 ECU
41178 Ecuador 2012 42.3 ECU
41372 Ecuador 2013 45.4 ECU
41566 Ecuador 2014 49.2 ECU
41760 Ecuador 2015 47.1 ECU
41954 Ecuador 2016 43.7 ECU
42148 Ecuador 2017 45.5 ECU
42342 Ecuador 2018 45.0 ECU
42536 Ecuador 2019 46.7 ECU
42730 Ecuador 2020 40.6 ECU
42924 Ecuador 2021 47.1 ECU
43118 Ecuador 2022 48.6 ECU
Data for Brazil:
country year emissions country_code
36883 Brazil 1990 244.0 BRA
37077 Brazil 1991 251.0 BRA
37271 Brazil 1992 253.0 BRA
37465 Brazil 1993 261.0 BRA
37659 Brazil 1994 274.0 BRA
37853 Brazil 1995 296.0 BRA
38047 Brazil 1996 320.0 BRA
38241 Brazil 1997 340.0 BRA
38435 Brazil 1998 347.0 BRA
38629 Brazil 1999 331.0 BRA
38823 Brazil 2000 347.0 BRA
39017 Brazil 2001 340.0 BRA
39211 Brazil 2002 329.0 BRA
39405 Brazil 2003 322 BRA
39599 Brazil 2004 334 BRA
39793 Brazil 2005 348 BRA
39987 Brazil 2006 357 BRA
40181 Brazil 2007 392.0 BRA
40375 Brazil 2008 442.0 BRA
40569 Brazil 2009 412.0 BRA
40763 Brazil 2010 484.0 BRA
40957 Brazil 2011 518 BRA
41151 Brazil 2012 552 BRA
41345 Brazil 2013 589 BRA
41539 Brazil 2014 610.0 BRA
41733 Brazil 2015 546.0 BRA
41927 Brazil 2016 485.0 BRA
42121 Brazil 2017 496.0 BRA
42315 Brazil 2018 475.0 BRA
42509 Brazil 2019 475.0 BRA
42703 Brazil 2020 423.0 BRA
42897 Brazil 2021 443.0 BRA
43091 Brazil 2022 454.0 BRA
for country in countries_of_interest:
country_data = emissions_long_1990_2022.query("country == @country")
fig = px.line(
country_data, x="year", y="emissions", title=f"Emissions for {country}"
)
fig.show()